 Author: IronFelix, 23.01.2009

  mailto: IronFelix@inbox.ru

  Table of contents
-----------------------
 
 1. Classes basics.
 2. Classes inheritance.
 3. Dynamic object construction and destruction.
 4. Abstract classes.
 5. Interfaces.
 6. Optimization.
 7. Configuration.
 8. Details.

-----------------------

  This file describes how to use a set of macros for object-oriented programming
 in FASM. The implementation of the macros is not described here.

  This macro set allows you to declare classes and implement them. Let me start
 with a brief overview of what you can do with it. A class can consist of the following:
  - fields
  - static functions
  - virtual functions (including abstract functions as a special case)
  - interfaces (abstract classes, with or without an IID)
  A class can have at most one direct ancestor, or none at all.
  There is a set of macros that help with calling class functions of any kind.
 Note that all class functions are assumed to use the "stdcall"
 calling convention.
  As for fields: a class is declared as a "struc", so you can access them as you
 do with ordinary structures. There are some details to this, which will be described
 later.
  The macro set automatically generates the data and code that classes need in order
 to work: virtual function tables, interface function redirectors,
 the set of used interface IIDs, and additional data for implementing "QueryInterface"
 functions with smaller code size.
  In the macro syntax descriptions, everything in [] brackets is optional, and everything
 in <> is a placeholder for a name.
  Now, let's start with the basics.


1.   Classes basics.
  
  Let's start with the macros used to declare a class. They are "class", which starts
 a declaration, and "endclass", which finishes it. To declare a field, use the macro "field";
 for a static function use "function", and for a virtual function use "virtual_function".
  The "class" macro has the following syntax:
  
   class <Name> [ : <AncestorName> ]

 where <Name> is the class name, and <AncestorName> is the name of the class ancestor.
  "endclass" has no special syntax; it just finishes the class declaration. Each
 class declaration must be finished with this macro.
  The "field" macro has one of the following syntaxes:

  field  <FieldName>  <Type>  [<InitValue>]
  field  <FieldName> : <Type> [= <InitValue>]
  
 where <FieldName> is the name of the field, <Type> is its type (one of the ordinary FASM
 types or a predefined structure) and <InitValue> is the initialization value for the field,
 which depends on the field type. <InitValue> is only used in static class instances; for
 dynamic instances, all initialization is performed by constructors.
  The "function" and "virtual_function" macros have the same syntax:

  function          <FunctionName> [ = <EntryPointName>]
  virtual_function  <FunctionName> [ = <EntryPointName>]

 where <FunctionName> is the function name and <EntryPointName> is an optional entry point name.
 Without <EntryPointName>, the function must be implemented under the name <ClassName>.<FunctionName>.

  Let's continue with a simple class example. It has a field "A", a static function
 "SimpleFunction" and a virtual function "VirtFunction":

   class MyClass

    field        A  :   DWORD  =   10
        
    function  SimpleFunction

    virtual_function  VirtFunction
     
   endclass
 
  By default, the implementation of a function is looked up under the name
 <ClassName>.<FunctionName>, so in the previous example you should implement
 "MyClass.SimpleFunction" and "MyClass.VirtFunction" (labels or constants with these
 names must be defined in the code).
  You can also point a function to an implementation with another name. To do so, just place
 "= <EntryPointName>" after the function declaration. For example:

   class MyClass

    field  A  dd ?

    function SimpleFunction = MyFunc

    virtual_function VirtFunction = MyVFunc

    function Func1 

   endclass

  In this case the function "MyClass.SimpleFunction" will be "MyFunc", "MyClass.VirtFunction"
 will be "MyVFunc", and "MyClass.Func1" should be defined as described above.
  There is a special case of "= <EntryPointName>" for virtual functions: abstract
 functions. To declare one, place " = 0" after the virtual function
 declaration. For example:
  
   class AbsClass

    virtual_function  Abs = 0

   endclass
     
  Class "AbsClass" now has the abstract function "AbsClass.Abs". There is a special
 virtual function in a class, the destructor, which can also be abstract. To declare it, use
 the macro "destructor" (it has the same syntax as the other function macros):

   class DestructClass

    destructor  Destroy

   endclass
    
  You should use the same name for all destructors in one class branch.
  You can declare static class instances and pointers to class objects. As mentioned
 earlier, a class is a structure, so to declare a static instance just do as you would for
 an ordinary variable. Note that all virtual functions of class MyClass must be implemented
 for the instance to be defined:

   MyObj MyClass
 
  To declare a pointer to a class instance, add the "lp" or "P" prefix to the class name. You
 can initialize it like an ordinary "dd" variable:

   pMyObj  PMyClass   
   pMyObj2 lpMyClass  NULL

  To read a field of a static object, access it as you would a structure field, for example:

   mov eax,[MyObj.A]
  
  With a pointer, you should go through a register, using <ClassName>.<FieldName> as the offset:

   mov eax,[pMyObj]
   mov ecx,[eax + MyClass.A]
 
  To call object and class functions, use the macros "objfcall" and "clsfcall".
  "objfcall" has one of the following syntaxes:

   objfcall <ObjectName>,<Function> [,<parameters>]
   objfcall <ObjectName>-><Function> [ ( [<parameters>] ) ]

 where <ObjectName> is the name of a static class instance or a pointer, <Function> is the
 name of a static or virtual function, and <parameters> are the function parameters separated
 with commas, as with "invoke" and similar macros.
  "clsfcall" has one of the following syntaxes:

   clsfcall <ClassName>,<Function>[,<ObjPointer> [, <parameters>] ]
   clsfcall <ClassName>-><Function> [ ( [<ObjPointer> [, <parameters>] ] ) ]
   clsfcall <ClassName>(<ObjPointer>)-><Function>( [<parameters>] )
   clsfcall (<ClassName>)<ObjPointer>-><Function>( [<parameters>] )

 where <ClassName> is the class name, <Function> is the name of a static or virtual function,
 <ObjPointer> is the address of the class object, and <parameters> are the function parameters
 separated with commas.
  The difference between "objfcall" and "clsfcall" is the following: if you have a class-typed
 variable (an instance or a pointer to an instance) you may use either macro, but if you have
 an untyped pointer, for example an address in a register or in memory addressed via a
 register, you must use "clsfcall". For example:

   objfcall MyObj->SimpleFunction()       ; calling static SimpleFunction with no parameters
   objfcall pMyObj->VirtFunction(0,1)     ; calling virtual VirtFunction with 2 parameters
   clsfcall MyClass->SimpleFunction(eax)  ; calling SimpleFunction of MyClass object with its 
                                          ; address in EAX
   clsfcall MyClass(eax)->SimpleFunction(); the same 
   clsfcall (MyClass)eax->SimpleFunction(); the same
   ; simple syntax
   objfcall MyObj,SimpleFunction
   clsfcall MyClass,SimpleFunction,[eax + 4] ; pointer to object is in memory pointed by  
                                             ; EAX with offset 4

  In each case the macro automatically selects the type of call needed: a direct call for
 a static function, or an indirect call via the vtable for a virtual or abstract function.
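
  To make the two kinds of call concrete, here is a rough C model. It is not the code
 the macros emit; the struct layout, the single-slot vtable and the helper names
 call_static, call_virtual and example are assumptions made only for illustration:

```c
#include <assert.h>

/* Hypothetical C model of "MyClass"; not generated by the macro package. */
typedef struct MyClass MyClass;
typedef int (*VirtFn)(MyClass *self, int a, int b);

struct MyClass {
    const VirtFn *vptr;  /* pointer to the class vtable, first in the object */
    int A;               /* field A */
};

/* Static function: its address is fixed when the program is built. */
static int MyClass_SimpleFunction(MyClass *self) { return self->A; }

/* Code behind the virtual function. */
static int MyClass_VirtFunction(MyClass *self, int a, int b) { return self->A + a + b; }

static const VirtFn MyClass_vtable[] = { MyClass_VirtFunction };

/* Direct call: the target address is part of the call itself. */
static int call_static(MyClass *obj) { return MyClass_SimpleFunction(obj); }

/* Indirect call: read the vptr from the object, then the slot, then call. */
static int call_virtual(MyClass *obj, int a, int b) { return obj->vptr[0](obj, a, b); }

static void example(void)
{
    MyClass obj = { MyClass_vtable, 10 };
    assert(call_static(&obj) == 10);
    assert(call_virtual(&obj, 0, 1) == 11);
}
```

  The caller writes the same kind of line in both cases; only the generated call differs,
 which is the choice "objfcall" and "clsfcall" make for you.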
  Now let's continue with inheritance.


2.   Classes inheritance.

  As I mentioned earlier, in my macro implementation a class can have only one direct
 ancestor. Let's look at how inheritance is processed, starting with an example:

   class Base
    
    field   Count    dd   ?
    
    function GetCount
   
    virtual_function SetCount

   endclass   

 where class Base consists of the field Count and two functions. Suppose we have a descendant of it:

   class Descendant : Base

    field  Count :  DWORD
 
    function GetCount

    virtual_function SetCount

   endclass 

 which has the same declarations. Now suppose we have static instances of both classes
 (you must implement the functions of both classes, of course):

   BaseObj  Base
   DescObj  Descendant 
 
  Let me describe what you get after that.
  Class Descendant will contain 2 fields of type "dd", the static function GetCount and the
 virtual function SetCount. You can still access its ancestor's data and static
 function. To do so, use the "Base" class name together with the class instance or class name:

  mov eax,[DescObj.Base.Count] ; get ancestor Count
  mov eax,[DescObj.Count]      ; get own Count

  objfcall DescObj->GetCount()      ; call own GetCount
  objfcall DescObj->Base.GetCount() ; call Base->GetCount with DescObj object

  ; the same through address in EDX
  mov eax,[edx + Descendant.Base.Count]
  mov eax,[edx + Descendant.Count]
  clsfcall Descendant->GetCount(edx)
  clsfcall Descendant->Base.GetCount(edx)

  So, when field and static function names are the same in the ancestor and the descendant,
 the descendant gets its own ones plus the ability to use the ancestor's. Virtual functions,
 on the other hand, are replaced by the descendant's. In the example above, 2 virtual function
 tables will be generated, one for class Base and one for Descendant. Each table holds a
 pointer to the appropriate function, and BaseObj and DescObj each hold a pointer to their
 class's vtable at offset 0. To summarize, BaseObj will be 8 bytes in size (vtable pointer +
 Count), and DescObj will be 12 bytes in size (vtable pointer + Base.Count + Count).
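
  The sizes and offsets above can be checked with a small C model of the layout. This is
 only a sketch: uint32_t is used as a stand-in for a 4-byte pointer and for "dd", and the
 names ancestor_count, own_count and example are made up for the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* uint32_t stands in for both a 4-byte pointer and a "dd" field (32-bit model). */
typedef struct {
    uint32_t vptr;   /* offset 0: address of the class vtable */
    uint32_t Count;  /* offset 4: Base.Count */
} Base;

typedef struct {
    Base     base;   /* offsets 0..7: inherited part; vptr refers to Descendant's table */
    uint32_t Count;  /* offset 8: Descendant's own Count */
} Descendant;

/* Counterpart of  mov eax,[edx + Descendant.Base.Count]  */
static uint32_t ancestor_count(const Descendant *d) { return d->base.Count; }

/* Counterpart of  mov eax,[edx + Descendant.Count]  */
static uint32_t own_count(const Descendant *d) { return d->Count; }

static void example(void)
{
    Descendant d = { { 0, 1 }, 2 };
    assert(sizeof(Base) == 8 && sizeof(Descendant) == 12);
    assert(offsetof(Descendant, Count) == 8);
    assert(ancestor_count(&d) == 1 && own_count(&d) == 2);
}
```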
  If the names are not equal, there is no need to add the ancestor class name:

   class A

    field  Count  dd ?

    function GetCount

   endclass

   class B : A

    field Index   dd  ?

    function Insert

   endclass      
 
   BObj B
   
   mov eax,[BObj.Count]      ; just use it as own
   objfcall BObj->GetCount() ; the function is used as its own too
  
  There are two additional macros that can be useful with inheritance:
 "inhcall" and "inhjump". The first one makes a call to the ancestor function, if any;
 the second one makes a jump to that function.
  "inhcall" has one of the following syntaxes:
  
   inhcall <ClassName>,<Function>[ ,<parameters> ]
   inhcall <ClassName>-><Function> [( [<parameters>] )]

 where <ClassName> is the name of the class whose ancestor function should be used,
 <Function> is a function name (static or virtual), and <parameters> are the function
 parameters.
  "inhjump" has one of the following syntaxes:

   inhjump <ClassName>,<Function>,ParamCount
   inhjump <ClassName>-><Function> [()] : ParamCount    

 where <ClassName> is the name of the class whose ancestor function should be used,
 <Function> is a function name (static or virtual), and ParamCount is the number of
 parameters (4 bytes each) of the function in which "inhjump" is used. ParamCount is needed
 because, if there is no ancestor function named <Function>, the macro just inserts a
 "retn ParamCount*4" instruction for a proper exit. If the ancestor function exists, the
 macro inserts a "jmp" to this function with no return instruction, since none is needed
 in this case.
  For practical uses of these macros, see the next chapter.


3.   Dynamic object construction and destruction.

  Of course, the most common use of objects cannot be imagined without dynamic
 construction and destruction. So, let me describe the macros that implement these
 features. First, the macro for automatic constructor implementation,
 "implement_constructor". It has one of the following syntaxes:
  
   implement_constructor <ClassName>, <ConstructorName>, <ParamCount>, [<InitFunction>], [<FieldsData>]
   implement_constructor <ClassName>[-> <ConstructorName>] ([<FieldsData>]) : ParamCount [ >> <InitFunction>] 

 where <ClassName> is the name of the class whose constructor is implemented and
 <ConstructorName> is the name of the constructor function (if not specified,
 the default name "Create" is used); as a result you get the function
 <ClassName>.<ConstructorName> defined. ParamCount is the number of 4-byte parameters
 passed to the constructor, and <InitFunction> is the name of an initialization function in
 the class, which is executed after the basic object creation and initialization have been
 performed. <FieldsData> is the initialization data for fields and has
 the following form:
  
    <FieldName1> = Value1, <FieldName2> = Value2, ... and so on 

 where <FieldNames> are the names of class fields and <Values> are constants
 which will be stored to those fields.
  Let me say a few words about what this macro actually does. Based on the class name,
 it inserts the object memory allocation code and checks whether the resulting pointer is
 zero; if so, it jumps to a "retn ParamCount*4" instruction. Otherwise, the
 special inner macro "dynamic_init_vptr" is used to initialize the object with
 pointers to vtables (when interfaces are used there will be more than one
 such pointer, which will be described later). Then it performs the initialization
 according to <FieldsData> and jumps to <InitFunction>, if one is specified.
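
  The steps just listed can be followed in this C sketch. It is a model under assumptions,
 not the macro output: malloc stands in for whatever allocator the macros use, the class is
 given one dummy virtual function so that it has a vtable, and the "Initialized" field and
 all function names are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Base Base;
typedef void (*VirtFn)(Base *self);

struct Base {
    const VirtFn *vptr;   /* filled in by the vptr initialization step */
    int RefCount;
    int Initialized;      /* hypothetical field, set by the init function */
};

static void Base_Dummy(Base *self) { (void)self; }
static const VirtFn Base_vtable[] = { Base_Dummy };

/* Initialization function: receives "this" and must hand it back unchanged. */
typedef Base *(*InitFn)(Base *self);
static Base *Base_Init(Base *self) { self->Initialized = 1; return self; }

static Base *Base_Create(InitFn init)
{
    Base *self = malloc(sizeof *self);  /* 1. allocate the instance memory */
    if (self == NULL)
        return NULL;                    /* 2. no memory: return zero at once */
    self->vptr = Base_vtable;           /* 3. the job of "dynamic_init_vptr" */
    self->RefCount = 1;                 /* 4. <FieldsData>: RefCount = 1 */
    self->Initialized = 0;
    return init ? init(self) : self;    /* 5. continue in <InitFunction>, if any */
}

static void example(void)
{
    Base *plain = Base_Create(NULL);
    Base *inited = Base_Create(Base_Init);
    assert(plain && plain->vptr == Base_vtable && plain->RefCount == 1);
    assert(inited && inited->Initialized == 1);
    free(plain);
    free(inited);
}
```

  In the macro version step 5 is a jump rather than a call, so the initialization function
 returns directly to the code that called the constructor.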
  Now it's time for an example:

   class Base 

    field  RefCount   dd    ?

    function Create
   
   endclass   

   implement_constructor Base(RefCount = 1) : 0

  In this example no initialization function is specified, and the initialization is done
 in the constructor: the field "RefCount" of the new object will be set to 1.
  Now a more complex example (with the "proc" macro from the FASM package):

   class Base2

    field RefCount  dd   ?
    field List      dd   ?

    function Create
    function Init

   endclass
   
   ; implementing the constructor, which jumps to initialization
   implement_constructor Base2->Create(RefCount = 1) : 0 >> Init

   ; initialization function
   proc Base2.Init

    ; here in EAX will be the address of object instance memory ("this" pointer)
    ; you can save it to stack or not
    
    ; perform List creation here

    ;  ... 
   
    ; !!! on exit EAX must contain "this" pointer
    ret
   endp
 
  In this example the initialization function "Init" is used. On entry, the stack contains
 the parameters for the constructor and EAX holds the address of the object instance memory
 (the "this" pointer). On exit from the initializer, EAX must be the same as on entry.
  Why do I use these initialization functions? Because of a possible "inhcall" in
 descendants. A descendant cannot call the ancestor's "Create", because it contains the
 memory allocation and basic initialization code of the ancestor object. So, to get the
 descendant object properly initialized, you should call the ancestor's "Init".
 Let me show it in the next example:

   class Base

    field  RefCount   dd  ?
    
    function Init
    function CreateObject

    ; some functions
    ; ...

   endclass  

   class Base2 : Base

    field List  dd   ?

    function Init
    function Create

    ; some functions
    ; ...

   endclass

   ; implement constructor of "Base"
   implement_constructor Base->CreateObject() : 0 >> Init

   proc Base.Init
    
    ; "inhcall" can be used here, as it will not place any code
    ; if there is no ancestor or function
    inhcall Base->Init()

    mov [eax + Base.RefCount], 1 ; initialize our RefCount
    ret ; returns
   endp

   ; implement constructor of Base2
   implement_constructor Base2->Create() : 0 >> Init

   proc Base2.Init
    
    ; here we can call the ancestor's initializer
    ; it will actually place "call Base.Init" code 
    inhcall Base2->Init()
    
    ; after "inhcall" we got RefCount initialized
    ; perform creation of List

    ; ...
 
    ret
   endp

  This example shows the usage of the "inhcall" macro and of initialization
 functions with constructors. Of course, if there is no need for initialization
 functions you may skip them altogether and do everything in the constructor, as
 this can decrease code size. In that case "implement_constructor" cannot always
 be used, so let me describe how to implement a constructor manually
 (a "closer look"):

    class Base

     field  RefCount   dd   ?
     field  List       dd   ?
 
     function Create
 
     ; some functions
     ; ...

    endclass      

    ; implement constructor manually
    proc Base.Create
     
     ; first, allocate memory for object instance
     ; for example, use HeapAlloc
     ; size of class instance is set in constant "sizeof.Base"
     ; by class declaration
     invoke HeapAlloc,[ProcessHeap],HEAP_ZERO_MEMORY,sizeof.Base
     test eax,eax
     jz .exit ; if no memory - just exit
     
     ; initialize pointers to vtables, if any
     ; if there are no such tables, no code will be inserted
     ; just pass our class name to "dynamic_init_vptr" and it will
     ; do all vptr initialization of the new object
     dynamic_init_vptr Base

     ; initialize our RefCount
     mov [eax + Base.RefCount],1
     ; create List

     ; ...

    .exit:
     ret
    endp

  But this example assumes that "Base" will have no
 descendants. Otherwise each descendant must repeat the ancestor's initialization
 code in its own constructor, which is not a good approach.
  Let me say a few words about implementing multiple constructors for a
 class. Suppose a class should have one constructor which takes a pointer
 to a structure or an object to initialize from, and another which takes a set of
 parameters. In this case you can still use the "implement_constructor"
 macro as in the previous examples, but you will get duplicated code for memory
 allocation and basic object initialization, although the only real difference is
 in the initialization functions. To share one basic construction
 code among all constructors of a class, use the macro "implement_constructor"
 together with "constructor_redirector". In this case "implement_constructor"
 holds the memory allocation code and vptr initialization, and
 each "constructor_redirector" tells that constructor which
 initialization function to use.
  "implement_constructor" should now have "?" as the <InitFunction> and,
 if the parameter counts of the initialization functions differ, as <ParamCount>
 too. It must not be called directly, only via the functions defined by the
 "constructor_redirector" macro, which are the actual constructors.
  "constructor_redirector" has one of the following syntaxes:

   constructor_redirector <ClassName>, <RedirName>, <ConstrName>, [ParamCount], <InitFunction>
   constructor_redirector (<ClassName>-><RedirName> [: ParamCount] >> <InitFunction>) -> <ConstrName>
  
 where <ClassName> is the name of the class for which the constructor is implemented,
 <RedirName> is the name of the redirector (actually the constructor that will be used),
 <ConstrName> is the name of the indirect constructor (defined by "implement_constructor"),
 ParamCount is the number of 4-byte parameters and <InitFunction> is the name of the
 appropriate class initialization function. Now we are ready for an example:

   class Point3D 

    field  X   dd  ?
    field  Y   dd  ?
    field  Z   dd  ?

    function CreateFromObj
    function CreateFromParams

    function InitFromObj
    function InitFromParams

    ; other functions
    ; ...
    
   endclass  

   ; implement indirect constructor
   ; it will take all "?" data from redirectors
   ; note, that "BasicCreate" is not declared in class "Point3D"
   ; because it is not necessary
   implement_constructor Point3D->BasicCreate() : ? >> ?

   ; implementing "CreateFromObj" constructor
   ; it will use "InitFromObj" initialization function
   ; which takes 1 parameter (pointer to another object)
   ; and redirect call to "BasicCreate"
   constructor_redirector (Point3D->CreateFromObj : 1 >> InitFromObj) -> BasicCreate

   ; the same for "CreateFromParams" with appropriate values
   ; it will take 3 parameters
   constructor_redirector (Point3D->CreateFromParams : 3 >> InitFromParams) -> BasicCreate
  
   ; the initialization code

   proc Point3D.InitFromObj lpObj

    ; initialize from object
    mov edx,[lpObj]
    mov ecx,[edx + Point3D.X]
    mov [eax + Point3D.X],ecx
    mov ecx,[edx + Point3D.Y]
    mov [eax + Point3D.Y],ecx
    mov ecx,[edx + Point3D.Z]
    mov [eax + Point3D.Z],ecx
    ret 
   endp

   proc Point3D.InitFromParams X,Y,Z

    ; init from stack parameters
    mov ecx,[X]
    mov [eax + Point3D.X],ecx
    mov ecx,[Y]
    mov [eax + Point3D.Y],ecx
    mov ecx,[Z]
    mov [eax + Point3D.Z],ecx
    ret
   endp

  Now you get the "Point3D.CreateFromObj" and "Point3D.CreateFromParams"
 constructors, and you can also use the appropriate initialization functions
 in descendants via the "inhcall" macro. With such an implementation you also get
 smaller code when there is more than 1 constructor, even if the
 object has no vtable pointers.
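
  The idea of sharing one piece of construction code among several constructors can be
 modeled in C as follows. This is a sketch only: calloc is an assumed allocator, and in the
 macro version the redirector jumps into the shared code, which then jumps to the chosen
 initialization function, while here plain calls are used instead:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int X, Y, Z; } Point3D;

/* Initialization functions: they only fill an already allocated object. */
static void Point3D_InitFromObj(Point3D *self, const Point3D *src) { *self = *src; }

static void Point3D_InitFromParams(Point3D *self, int x, int y, int z)
{
    self->X = x; self->Y = y; self->Z = z;
}

/* Shared part, written once: allocation (and vptr setup, if the class had any). */
static Point3D *Point3D_BasicCreate(void) { return calloc(1, sizeof(Point3D)); }

/* The actual constructors: each one only selects its initialization function. */
static Point3D *Point3D_CreateFromParams(int x, int y, int z)
{
    Point3D *self = Point3D_BasicCreate();
    if (self) Point3D_InitFromParams(self, x, y, z);
    return self;
}

static Point3D *Point3D_CreateFromObj(const Point3D *src)
{
    Point3D *self = Point3D_BasicCreate();
    if (self) Point3D_InitFromObj(self, src);
    return self;
}

static void example(void)
{
    Point3D *p = Point3D_CreateFromParams(1, 2, 3);
    Point3D *q = p ? Point3D_CreateFromObj(p) : NULL;
    assert(p && q && q->X == 1 && q->Y == 2 && q->Z == 3);
    free(p);
    free(q);
}
```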
  Of course, constructor implementation depends on the situation, and you
 are free to implement it accordingly to decrease your code size.
  Now, a few words about how such constructors can be called. This is done
 with 2 macros: "clsfcall" and "constrcall". The latter is introduced only
 for convenience: it is easy to find with a text search when you
 need to locate the points where objects are created. The "clsfcall" syntax has
 already been described; "constrcall" has one of the following syntaxes:
 
   constrcall <ClassName>,<ConstrName> [, <parameters>]
   constrcall <ClassName>-><ConstrName> [( [<parameters>] )]
 
 where <ClassName> is the name of the class whose object is constructed, <ConstrName> is
 the name of the appropriate constructor and <parameters> are optional parameters
 for that constructor.
  So calling our Point3D constructors can look like this:

   ; using clsfcall
   clsfcall Point3D->CreateFromObj([lpObj])
   ; using constrcall
   constrcall Point3D->CreateFromParams([X],[Y],[Z])
   ; it is possible to use standard "stdcall" macro too
   stdcall Point3D.CreateFromObj, [lpObj]

  After any of these calls you get a new initialized Point3D object.
  Note that to use the "clsfcall" and "constrcall" macros,
 constructors must be declared as static functions using the "function"
 macro in the class. Note also that you cannot create an
 instance (statically or dynamically) of a class that has any
 abstract functions: you will get an error message with an appropriate
 description.
  Now we know how to construct an object, but what about destruction? As
 I mentioned earlier, there can be a "destructor" in a class. In my
 implementation I have used the Object Pascal approach: the destructor is a virtual
 function whose offset in the vtable is -4, so it sits 4 bytes before the point where
 the vtable actually starts. This keeps the destructor from colliding with other
 virtual functions, and the fixed offset allows one piece of code to be used for all objects.
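
  A C model of this table layout may help. It is a sketch with made-up names; index -1
 means "one pointer-sized slot before the table start", which is the -4 offset of the
 32-bit code described here:

```c
#include <assert.h>

typedef struct Obj Obj;
typedef void (*Slot)(Obj *self);

struct Obj {
    const Slot *vptr;  /* points at slot 0, one entry past the destructor */
    int destroyed;     /* hypothetical flag, only to observe the call */
};

static void Obj_Destroy(Obj *self) { self->destroyed = 1; }
static void Obj_Method(Obj *self)  { (void)self; }

/* Table storage: the destructor comes first, regular virtual functions follow. */
static const Slot Obj_table[] = { Obj_Destroy, Obj_Method };

/* The vptr stored in an object skips the destructor entry. */
static void Obj_init(Obj *self)
{
    self->vptr = &Obj_table[1];
    self->destroyed = 0;
}

/* Works for every class built this way: the destructor is always at index -1. */
static void call_destructor(Obj *self) { self->vptr[-1](self); }

static void example(void)
{
    Obj o;
    Obj_init(&o);
    assert(o.vptr[0] == Obj_Method);  /* regular slots start at index 0 */
    call_destructor(&o);
    assert(o.destroyed == 1);
}
```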
  As for the memory deallocation routine, it can be placed in the destructor, but
 I think this is not a good approach, because then you would have to write
 uninitialization functions to walk through the ancestors' uninit code (like the
 constructors' "Init" functions). It is better to implement it like this:

   class Base

    field List   dd  ?

    destructor Destroy

    ; use common function for all objects with destructors
    function Free = FreeObjectWithDestructor 

    ; some constructors and other functions
    ; ... 

   endclass   
         
   class Desc : Base

    field pMem   dd   ?

    destructor Destroy ; overriding destructor
    
    ; some constructors and other functions
    ; ... 

   endclass
     
   ; destructors code
   proc Base.Destroy this

    ; free List
   
    ; ... 

    leave ; ! only because "proc" makes a stack frame
          ; and we are done with the stack here
 
    ; use "inhjump" here
    ; it actually places a "retn 4" instruction, as we don't have
    ; an ancestor
    inhjump Base->Destroy() : 1 
   endp
   
   proc Desc.Destroy this

    ; free pMem
    mov eax,[this]
    invoke HeapFree,[ProcessHeap],0,[eax + Desc.pMem]   

    leave ; ! only because "proc" makes a stack frame
          ; and we are done with the stack here

    ; using "inhjump" will cause
    ; "jmp Base.Destroy" code to be inserted
    ; no "retn" is needed here
    inhjump Desc->Destroy() : 1
   endp

   ; common function for all objects with destructors
   proc FreeObjectWithDestructor lpObj

    mov eax,[lpObj]
    test eax,eax
    jz .exit
    push eax ; -> for memory deallocation
    ; call destructor
    push eax
    mov eax,[eax]
    call near [eax - 4]
    ; free instance memory
    invoke HeapFree,[ProcessHeap],0 ; <- "this" is in stack already
   .exit:
    ret
   endp
   
  This example shows an approach where the destructor is actually used as an
 uninitialization function. Each destructor uses the macro "inhjump" at its
 exit point. This works because every destructor takes only one parameter,
 "this", so there is no need for a "push [this]/call/retn" sequence in the code:
 a simple jump to the ancestor destructor does the same and is smaller
 in size. The last destructor in the sequence performs the correct exit,
 as it has a "retn 4" instruction at its exit point.
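
  The order of events in such a destructor chain can be seen in this C model. It is only an
 illustration under assumptions: malloc/free replace the heap calls, the counters exist just
 to make the order observable, and Desc_Free calls the destructor directly instead of going
 through the vtable. Whether a C compiler turns the last call into a jump is up to the
 compiler; "inhjump" always does:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { void *List; } Base;
typedef struct { Base base; void *pMem; } Desc;

static int base_cleanups;  /* hypothetical counters, for observation only */
static int desc_cleanups;

static void Base_Destroy(Base *self)
{
    free(self->List);            /* free own resources */
    self->List = NULL;
    base_cleanups++;
    /* no ancestor: plain return, the place of "retn 4" */
}

static void Desc_Destroy(Desc *self)
{
    free(self->pMem);            /* free own resources first */
    self->pMem = NULL;
    desc_cleanups++;
    Base_Destroy(&self->base);   /* last action: continue in the ancestor */
}

/* Counterpart of FreeObjectWithDestructor in this simplified model. */
static void Desc_Free(Desc *obj)
{
    if (obj == NULL)
        return;
    Desc_Destroy(obj);           /* run the whole destructor chain */
    free(obj);                   /* then release the instance memory */
}

static void example(void)
{
    Desc *d = calloc(1, sizeof *d);
    assert(d != NULL);
    d->pMem = malloc(16);
    d->base.List = malloc(16);
    Desc_Free(d);
    assert(desc_cleanups == 1 && base_cleanups == 1);
    Desc_Free(NULL);             /* a null pointer is simply ignored */
}
```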
  So, to free our Base and Desc objects we can do:
   
   pBase PBase ; for example

   ; using defined object
   objfcall pBase->Free()
   ; using address in EAX
   clsfcall Desc(eax)->Free() 

  The macro "inhjump" can be used in other functions too, but you should
 use it carefully: the functions must have the same parameter count, and you
 should also remember that no code is executed after the
 "jmp <Ancestor>.<Function>" in the function where "inhjump" is used.
  This is not true for the "inhcall" macro, which performs a "call" to the
 ancestor function and pushes the parameters onto the stack before this call.
  Now let's continue with abstract classes.


4.   Abstract classes.

  Abstract classes in my implementation are classes which can only
 have abstract functions: no fields and no static or non-abstract virtual functions
 (as in object-oriented programming).
 For example:

   class AbstractCalc

    virtual_function Add = 0
    virtual_function Multiply  = 0

   endclass 

 class "AbstractCalc" has 2 functions, both abstract. But where
 can such classes be used? They can serve as common classes shared by the
 caller and the implementor. Each function of such a class describes a kind of
 "rule" according to which the actual function is called. For example, suppose we
 have implemented some calculation in a DLL:
  
   class Calc : AbstractCalc

    ; some fields
    ; ...

    virtual_function Add 
    virtual_function Multiply 

    ; some functions
    ; ...
   
   endclass
 
   proc Calc.Add this,A,B

    ; some implementation
    ; ...

    ret
   endp

   proc Calc.Multiply this,A,B

    ; some implementation
    ; ...

    ret
   endp

  Now the caller needs only a pointer to any implementation made
 according to our "AbstractCalc" in order to work with it, since we know how to call
 the methods of this implementation. That is why only pointers to abstract
 classes can be declared; instances are not allowed in this case
 (the caller actually has no code for them).
 For example:

   pCalc PAbstractCalc

   ; suppose we have got this pointer somehow,
   ; for example by creating a "Calc" object
   ; in some DLL with the function "CreateCalcObject"
   invoke CreateCalcObject 
   mov [pCalc],eax  

   ; now we can call pCalc method
   objfcall pCalc->Add([A],[B])
   objfcall pCalc->Multiply([A],[B])

  So abstract classes can be used to link a separate implementation with
 its caller. Note that an abstract function must be overridden by a virtual
 function with the same name.
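
  The same separation can be written down in C, where the shared "rule" is a table of
 function pointers. This is a rough model, not what the macros generate; the "calls" field,
 the static the_calc object and add_then_multiply are hypothetical, and both sides sit in
 one file here only to keep the sketch short:

```c
#include <assert.h>

typedef struct AbstractCalc AbstractCalc;

/* The only thing caller and implementor share: slot order and signatures. */
typedef struct {
    int (*Add)(AbstractCalc *self, int a, int b);
    int (*Multiply)(AbstractCalc *self, int a, int b);
} AbstractCalcVtbl;

struct AbstractCalc { const AbstractCalcVtbl *vptr; };

/* --- implementor side (could live in a DLL) --- */
typedef struct { AbstractCalc iface; int calls; } Calc;

static int Calc_Add(AbstractCalc *self, int a, int b)
{
    ((Calc *)self)->calls++;   /* valid: iface is the first member of Calc */
    return a + b;
}

static int Calc_Multiply(AbstractCalc *self, int a, int b)
{
    ((Calc *)self)->calls++;
    return a * b;
}

static const AbstractCalcVtbl Calc_vtbl = { Calc_Add, Calc_Multiply };
static Calc the_calc = { { &Calc_vtbl }, 0 };

static AbstractCalc *CreateCalcObject(void) { return &the_calc.iface; }

/* --- caller side: knows only AbstractCalc --- */
static int add_then_multiply(AbstractCalc *calc, int a, int b)
{
    int sum = calc->vptr->Add(calc, a, b);
    return calc->vptr->Multiply(calc, sum, b);
}

static void example(void)
{
    AbstractCalc *calc = CreateCalcObject();
    assert(add_then_multiply(calc, 2, 3) == 15);  /* (2 + 3) * 3 */
}
```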
  Now let's go on to interfaces.


5.   Interfaces.

  In my class implementation, interfaces are abstract classes with IIDs,
 just like in the COM model. An interface has the "uuid" macro with a GUID in its
 class declaration, for example:

   class IUnknown

    uuid [00000000-0000-0000-C000-000000000046]
    ; or -  uuid 00000000-0000-0000-C000-000000000046 

    virtual_function QueryInterface = 0
    virtual_function AddRef         = 0
    virtual_function Release        = 0

   endclass 

 declares the interface IUnknown, the basic interface for all COM objects. Its IID
 can be accessed via the name "IID_IUnknown". That is, any "uuid" declaration
 declares an IID_<InterfaceName> label marking the address of the IID bytes in
 memory (data or code, depending on where you place it).
  As in the COM model, many interfaces can be implemented in one
 class. To do this, use the "implement_interface" macro, which has the following
 syntax:

   implement_interface <InterfaceList>

 where <InterfaceList> is a list of implemented interfaces in the following
 form:
  
   <InterfaceName> [(<attributes>)], ...

 where <InterfaceName> is the name of an interface and <attributes> are optional
 attributes, separated with commas. Attributes will be described in more detail
 later; for now let's continue with the declaration and implementation of interfaces.
  To implement interfaces which are in the same class branch, it is
 enough to inherit from the appropriate interface of this branch directly, without
 using "implement_interface". For example:

   class IClassFactory : IUnknown

    uuid [00000001-0000-0000-C000-000000000046]

    virtual_function CreateInstance = 0
    virtual_function LockServer     = 0

   endclass 

   ; implements both IUnknown and IClassFactory
   class TClassFactory : IClassFactory

    field  RefCount  dd   ?

    ; from IUnknown
    virtual_function QueryInterface
    virtual_function AddRef        
    virtual_function Release       
    
    ; from IClassFactory
    virtual_function CreateInstance
    virtual_function LockServer    

   endclass 
 
   ; implements functions
   ; ...

  In this example we have declared the class "TClassFactory", which implements
 two interfaces: IUnknown and IClassFactory. Let's see how to implement
 interfaces from different class branches:

   class ISequentialStream : IUnknown

    uuid [0c733a30-2a1c-11ce-ade5-00aa0044773d]
 
    virtual_function Read  = 0
    virtual_function Write = 0

   endclass   

   class IMyPicture : IUnknown

    uuid [5FC02269-1E89-494D-A7DF-A25A7E180BFB]
   
    virtual_function Draw = 0

   endclass

   ; implements both interfaces
   class TSeqStreamPicture : ISequentialStream

    field RefCount   dd   ?

    implement_interface IMyPicture

    ; from IUnknown
    virtual_function QueryInterface
    virtual_function AddRef        
    virtual_function Release

    ; from ISequentialStream       
    virtual_function Read  
    virtual_function Write 

    ; from IMyPicture
    virtual_function Draw

   endclass

   ; implements functions
   ; ...

  Here we have a class which actually implements 3 interfaces: IUnknown,
 ISequentialStream and IMyPicture. Let me say a few words about how this
 is implemented. When you implement an interface via "implement_interface",
 a new field is added to the class: a pointer to the interface vtable. This
 is not done in all cases, due to optimization, which will be described
 later. The field is inserted at the point where "implement_interface" is used.
 So, if any fields are declared before "implement_interface", the
 implemented interfaces will be placed after these fields. Now, what
 is an interface vtable? It works like an ordinary vtable except for one thing:
 it contains not pointers to the actual functions, but pointers to redirectors
 to those functions. This is needed because when you call an interface function
 via an abstract class, its "this" does not point to the object's "this", but to the
 field where the interface vptr resides. So, to properly call a function implemented
 by a class that implements multiple interfaces, there must be a redirector
 which corrects "this" to point to the object and then calls the object's function.
  There are 2 types of redirectors in my macro implementation, and both of them
 will be described later under optimization issues.
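
  What a redirector has to do can be shown in C. The sketch below is an assumption-laden
 model, not one of the two redirector types of the package: the field names, the "draws"
 counter and the helper draw_via_interface are invented for the illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int (*Draw)(void *iface_this); } IMyPictureVtbl;

typedef struct {
    const void           *vptr;          /* main vtable pointer, offset 0 */
    int                   RefCount;
    const IMyPictureVtbl *picture_vptr;  /* field added for the extra interface */
    int                   draws;         /* hypothetical, counts Draw calls */
} TSeqStreamPicture;

/* The real method: expects "this" to be the object itself. */
static int TSeqStreamPicture_Draw(TSeqStreamPicture *self) { return ++self->draws; }

/* Redirector: receives the address of the picture_vptr field, moves it back
   to the start of the object, then continues in the real method. */
static int Draw_redirector(void *iface_this)
{
    TSeqStreamPicture *self = (TSeqStreamPicture *)
        ((char *)iface_this - offsetof(TSeqStreamPicture, picture_vptr));
    return TSeqStreamPicture_Draw(self);
}

static const IMyPictureVtbl picture_vtbl = { Draw_redirector };

/* What a caller holding only an IMyPicture pointer does. */
static int draw_via_interface(const IMyPictureVtbl **iface)
{
    return (*iface)->Draw((void *)iface);
}

static void example(void)
{
    TSeqStreamPicture obj = { NULL, 1, &picture_vtbl, 0 };
    assert(draw_via_interface(&obj.picture_vptr) == 1);
    assert(obj.draws == 1);  /* the call reached the object, not the field */
}
```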
  Now let's continue with the AddRef and Release functions. The first increments the
 reference counter of the object, and the second decrements it and, when the counter
 reaches zero, destroys the object. In my macro package these functions are already
 implemented and can be used in any class under the following conditions: the reference
 counter must be placed strictly after the object vptr, at offset 4 from the object start,
 and there must be a destructor in the class, even if it does nothing. See the next example:

   class TClassFactory : IClassFactory

    field RefCount dd  ? ; declare reference counter just after vptr
                         ; so - it must be the first field in class

    ; implements destructor as a stub with 1 parameter in stack
    ; if there should be real destructor - just implement
    ; it as needed
    destructor Destroy = FuncStub1

    ; from IUnknown
    virtual_function QueryInterface
    ; take ready implementation of next 2 functions
    virtual_function AddRef         = IUnknown_AddRef         
    virtual_function Release        = IUnknown_Release
    
    ; from IClassFactory
    virtual_function CreateInstance
    virtual_function LockServer    

   endclass         

   ; defining a stub
   if used FuncStub1
    FuncStub1: retn 4 ; simply returns
   end if 

   ; implements other functions
   ; ...

  So, you need not implement AddRef and Release for each class - just
 use the ready implementations. Note that these functions are configurable
 via preprocessor definitions, which will be described later in the
 configuration chapter.
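  The behaviour of the ready AddRef and Release can be pictured with a toy
 model. This Python sketch is not the package code: "ToyObject", "add_ref",
 "release" and "func_stub" are hypothetical names, and both the order
 "destructor first, then free" and the returned counter value are assumptions.

```python
# Toy model of the ready AddRef/Release; not the package's real code.
class ToyObject:
    def __init__(self, destructor):
        self.ref_count = 1            # stands in for the dword at offset 4
        self.destructor = destructor  # must exist, even as an empty stub
        self.freed = False


def add_ref(obj):
    obj.ref_count += 1
    return obj.ref_count


def release(obj):
    obj.ref_count -= 1
    if obj.ref_count == 0:
        obj.destructor(obj)  # assumed order: destructor first ...
        obj.freed = True     # ... then the instance memory is freed
    return obj.ref_count


def func_stub(obj):
    """Empty destructor, the counterpart of FuncStub1."""


factory = ToyObject(func_stub)
add_ref(factory)   # counter is now 2
release(factory)   # counter is 1, the object stays alive
release(factory)   # counter is 0, destructor runs and memory is freed
```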
  Now let me continue with the QueryInterface function. It returns the
 appropriate pointer to an implemented interface, that is - the address
 within the object at which the interface vptr resides (it can be an
 interface vptr or the pointer to the object vtable). So, the function
 checks the requested IID against the IIDs of all interfaces which the
 object implements, and indicates success or failure. On success it
 returns zero in EAX (S_OK) and stores the pointer to the implemented
 interface, which is AddRef'ed before QueryInterface returns; on failure
 EAX contains the E_NOINTERFACE error code, and the interface pointer is
 set to zero.
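  The contract of QueryInterface can be sketched as follows. This is a Python
 model under stated assumptions, not the generated code: the result is returned
 as a pair instead of EAX plus a stored pointer, IIDs are plain strings, and
 the layout of TSeqStreamPicture (IMyPicture vptr at offset 8) is assumed.

```python
S_OK = 0x00000000
E_NOINTERFACE = 0x80004002


def query_interface(this, implemented, riid, add_ref):
    """'implemented' maps an IID to the offset of its vptr inside the object."""
    if riid in implemented:
        add_ref(this)                       # the returned pointer is AddRef'ed
        return S_OK, this + implemented[riid]
    return E_NOINTERFACE, 0                 # the interface pointer is zeroed


# Assumed layout for TSeqStreamPicture: main vptr at 0, IMyPicture vptr at 8.
layout = {"IID_IUnknown": 0, "IID_ISequentialStream": 0, "IID_IMyPicture": 8}
add_ref_calls = []
found = query_interface(0x1000, layout, "IID_IMyPicture", add_ref_calls.append)
missing = query_interface(0x1000, layout, "IID_IStorage", add_ref_calls.append)
```

  Here "found" holds S_OK with the address of the IMyPicture vptr field, and
 "missing" holds E_NOINTERFACE with a zero pointer.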
  In most cases this function can be implemented automatically via the
 special macro "implement_QueryInterface" with <ClassName> as a parameter,
 where <ClassName> is the name of the class that implements the interfaces.
 But there is a special case where you should add your own code -
 implementing objects which aggregate other objects. In this case, in
 addition to the standard IID checking, QueryInterface must contain code
 which asks the aggregated objects for interfaces that the object does not
 implement itself. I will describe all of this in the details chapter, and
 now here is the final example of a partially implemented TClassFactory
 using my macro package:
  
   class TClassFactory : IClassFactory

    field RefCount dd  ? ; declare reference counter just after vptr
                         ; so - it must be the first field in class

    ; implements destructor as a stub with 1 parameter in stack
    ; if there should be real destructor - just implement
    ; it as needed
    destructor Destroy = FuncStub1

    ; from IUnknown
    virtual_function QueryInterface
    ; take ready implementation of next 2 functions
    virtual_function AddRef         = IUnknown_AddRef         
    virtual_function Release        = IUnknown_Release
    
    ; from IClassFactory
    virtual_function CreateInstance
    virtual_function LockServer    

   endclass         

   ; defining a stub
   if used FuncStub1
    FuncStub1: retn 4 ; simply returns
   end if 
     
   ; implementing QueryInterface
   implement_QueryInterface TClassFactory

   ; other functions
   ; ...

  Now let me continue with optimization issues.


6.   Optimization.

  There are several optimization issues. The first is the ability to declare
 both a virtual and a static function with the same name in a class. In this
 case it is highly recommended that both point to the same entry point.
 Why have I done this? There is at least one reason from my point of view:
 the code for a static function call is the same size or 1 byte smaller than
 the code for a virtual function call and, as far as I know, a little faster.
 How can we use it? We can use it in the implementation of classes which are
 designed to be used via abstract classes.
 Let me show it with an example:

   class IAbsList 

    virtual_function Add  = 0
    virtual_function Grow = 0  

   endclass  

   class TList : IAbsList

    ; possible fields
    field Count     : DWORD
    field Capacity  : DWORD
    field pMem      : DWORD
    
    ; declare functions as virtual to override all from IAbsList
    virtual_function Add  
    virtual_function Grow   

    ; declare Grow as static too for inner use
    function Grow

   endclass

   ; possible Add implementation
   proc TList.Add this

    ; omitted checking capacity
    ; suppose we should grow
    ; here we will get call to static function
    ; not to virtual 
    ; it is 1 byte less
    clsfcall TList([this])->Grow() 

    ret
   endp

   proc TList.Grow this

    ; some grow code

    ret
   endp

   ; a call via pointer to IAbsList will be virtual in any case
   clsfcall IAbsList(eax)->Grow()

  So, if you are able to use this in your code - just do it.
 Of course, this cannot be done in all cases, and you should remember that
 such functions will be called as static in all descendants of TList, so
 via "objfcall" or "clsfcall" with "TList" or any descendant name you
 cannot get a virtual call of "Grow". Note that the "objfcall" and
 "clsfcall" macros check for a static function declaration first, and
 only if there is none is the virtual function declaration checked.
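  The lookup order can be modelled in a few lines. This Python sketch is an
 assumption-based illustration of the rule "static first, then virtual"; the
 data structure, the function "resolve_call" and the descendant "TSortedList"
 are hypothetical and do not come from the macro set.

```python
classes = {
    "IAbsList": {"ancestor": None, "static": set(), "virtual": {"Add", "Grow"}},
    "TList": {"ancestor": "IAbsList", "static": {"Grow"}, "virtual": {"Add", "Grow"}},
    # hypothetical descendant, only here to show that the static Grow is inherited
    "TSortedList": {"ancestor": "TList", "static": set(), "virtual": {"Add"}},
}


def resolve_call(class_name, function):
    """Return the kind of call made for 'function' through 'class_name'."""
    chain = []
    name = class_name
    while name is not None:                 # the class and all its ancestors
        chain.append(classes[name])
        name = classes[name]["ancestor"]
    if any(function in c["static"] for c in chain):
        return "static"                     # static declarations are checked first
    if any(function in c["virtual"] for c in chain):
        return "virtual"
    raise KeyError(function)
```

  With these tables a call of Grow through TList or TSortedList resolves to
 "static", while the same call through IAbsList resolves to "virtual".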
  Now let me continue with the second optimization issue - vtable
 optimization. The main idea is the following: suppose we have a class with
 virtual functions, suppose that it has an ancestor with virtual functions,
 and none of the ancestor's functions are overridden in the descendant
 class. In this case we can use the descendant's vtable for both classes.
 So if some vtable is equal to the "low" part of another vtable, the larger
 one can be used for both classes. This is true for all vtables, including
 interface ones. Moreover, it is the interface vtables that can be
 optimized the most.
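  The "low part" rule can be written down as a prefix test. The Python sketch
 below is a simplified model, not the algorithm of the macro set: vtables are
 lists of entry point names, and the classes TBase, TDerived and TOther are
 hypothetical.

```python
def is_low_part(small, large):
    """True if 'small' equals the first len(small) entries of 'large'."""
    return large[:len(small)] == small


def share_vtables(vtables):
    """Map each class to the class whose vtable it can use."""
    owner = {}
    for name, table in vtables.items():
        fits = [other for other, t in vtables.items() if is_low_part(table, t)]
        owner[name] = max(fits, key=lambda other: len(vtables[other]))
    return owner


vtables = {
    "TBase": ["TBase.Add", "TBase.Grow"],
    "TDerived": ["TBase.Add", "TBase.Grow", "TDerived.Sort"],   # nothing overridden
    "TOther": ["TOther.Add", "TBase.Grow"],                     # Add is overridden
}
shared = share_vtables(vtables)
```

  Here TBase can use the vtable of TDerived, while TOther must keep its own
 vtable because its first entry differs.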
  To explain why, let me return to my words about the 2 types of
 redirectors. The first is a redirector to a virtual function, and the
 second is a redirector to a static function. Let me show example code
 for both:

   ; redirector to virtual function
   sub dword [esp + 4], 4 ; correct this
   ; call true object function
   mov eax,[esp + 4]
   mov eax,[eax]
   jmp near [eax + 8] ; suppose at offset 8 in vtable

   ; redirector to static function
   sub dword [esp + 4], 4 ; correct this
   jmp <ClassName>.<Function> ; jump to static function

  Both redirectors assume that the interface vptr field is at offset 4
 in the main class, and each calls some function: the virtual one calls
 the function whose address is at offset 8 in the vtable, the static one
 calls the function <ClassName>.<Function>. Now - back to vtable
 optimization. As you may have noticed, a redirector to a virtual function
 can be used in several classes; it just must meet the following
 conditions: the interface vptr must be at the offset the redirector
 expects, and the function offset in the vtable must be as expected too.
 So if interface vtables consist of addresses of such redirectors, vtable
 optimization is easily done. It is possible with static redirectors too,
 but not as well, because when the static function to which a redirector
 jumps is overridden, the redirector cannot be used anymore - a new one
 is generated, as the function address has changed. A virtual function
 redirector doesn't work with direct addresses, only with offsets, so
 overriding doesn't matter for it - it will still be used.
  Here we get to another optimization issue - redirector optimization.
 The main idea: a new redirector is generated only if an identical one
 has not already been inserted in the code. This is true for both types
 of redirectors. So, each redirector is unique throughout the code.
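  The sizes of both redirector types and the "generate only once" rule can be
 checked with a short calculation. This Python sketch assumes 32-bit encodings
 with 8-bit immediates and displacements (offsets below 128); the class
 "RedirectorPool" and its labels are hypothetical and only model the idea.

```python
# Encoded sizes (bytes) of the 32-bit instructions used by the redirectors.
SUB_THIS = 5      # sub dword [esp + 4], imm8
LOAD_THIS = 4     # mov eax, [esp + 4]
LOAD_VTABLE = 2   # mov eax, [eax]


def virtual_redirector_size(vtable_offset):
    # jmp near [eax] takes 2 bytes, jmp near [eax + disp8] takes 3 bytes
    jump = 2 if vtable_offset == 0 else 3
    return SUB_THIS + LOAD_THIS + LOAD_VTABLE + jump


def static_redirector_size(short_jump):
    # a short jmp takes 2 bytes, a near jmp with a 32-bit offset takes 5 bytes
    return SUB_THIS + (2 if short_jump else 5)


class RedirectorPool:
    """Hands out one redirector per distinct (this offset, target) pair."""

    def __init__(self):
        self.generated = {}

    def get(self, this_offset, target):
        key = (this_offset, target)
        if key not in self.generated:        # generate only if not present yet
            self.generated[key] = "redirector_%d" % len(self.generated)
        return self.generated[key]


pool = RedirectorPool()
first = pool.get(4, ("virtual", 8))
again = pool.get(4, ("virtual", 8))          # the same redirector is reused
other = pool.get(4, ("static", "TList.Grow"))
```

  The functions give 13 or 14 bytes for a virtual redirector and 7 or 10 bytes
 for a static one.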
  The next optimization issue is the ability to use static functions to
 implement interface functions. This is the case that generates redirectors
 to static functions. Why have I added this ability? Why not simply make
 all implemented interface functions virtual? Let me explain.
 As you may have noticed, a redirector to a virtual function is almost 2
 times larger than one to a static function. You should remember this
 when your code implements only one class with multiple interfaces:
 using virtual function redirectors will make your code larger. Let me
 continue with an example:

   ; suppose we have 4 interfaces declared
   class IHello : IUnknown

    uuid [B7FA51D5-8B4E-413C-8640-9CCDBD602E7D]

    virtual_function Hello     = 0
    virtual_function SayHello  = 0

   endclass
   
   class IBye : IUnknown

    uuid [B7FA51D5-8B4E-413C-8640-9CCDBD602E8D]

    virtual_function Bye     = 0

   endclass

   class IMessage : IUnknown

    uuid [B7FA51D5-8B4E-413C-8640-9CCDBD602E9D]

    virtual_function ShowMessage   = 0

   endclass 

   class IError : IUnknown

    uuid [B7FA51D5-8B4E-413C-8640-9CCDBD602EAD]

    virtual_function ShowError   = 0

   endclass 

  Now let us count the bytes of auxiliary data and code needed to make
 the next class work:

  ; implement all 4 interfaces
  class TImplAll : IHello

   field  RefCount  dd  ?
  
   implement_interface IBye, IMessage, IError

   ; from IUnknown
   virtual_function QueryInterface
   virtual_function AddRef             
   virtual_function Release   

   ; from IHello
   virtual_function Hello   
   virtual_function SayHello   

   ; from IBye
   virtual_function Bye

   ; from IMessage
   virtual_function ShowMessage 

   ; from IError
   virtual_function ShowError

  endclass 

  As all functions are virtual, we get an 8*4=32 byte vtable for the class.
 As every interface must have its own vtable, we have 4*4=16 bytes for
 each "implement_interface"'d interface, which gives 16*3=48 bytes for 3
 interface vtables. As all interface functions are redirected to virtual
 functions of our implementor, we have 13+14+14+14=55 bytes of redirectors
 for each "implement_interface"'d interface, which gives 55*3=165 bytes
 of redirector code (each redirector is unique in this case due to unique
 offsets). As a result we get 32 + 48 + 165 = 245 bytes of auxiliary code
 and data. Now let us do the same for the next class declaration:

  class TImplAll : IHello

   field  RefCount  dd  ?
  
   implement_interface IBye(static), IMessage(static),\
                       IError(static)

   ; from IUnknown
   virtual_function QueryInterface
   virtual_function AddRef             
   virtual_function Release   

   ; declare as static too
   function QueryInterface
   function AddRef             
   function Release   

   ; from IHello
   ; must be virtual, as we have abstract ancestor
   virtual_function Hello   
   virtual_function SayHello   

   ; from IBye
   function Bye

   ; from IMessage
   function ShowMessage
 
   ; from IError
   function ShowError

  endclass 

  In this example the interface attribute "static" is used, which makes
 all implemented interface functions be redirected to static functions
 first; only if there is no static function in the class is the virtual
 one checked. Let us repeat our calculations. The class vtable is smaller,
 as we have only 5 virtual functions: 5*4=20 bytes. The size of the
 interface vtables has not changed and is still 4*4*3=48 bytes. The size
 of a redirector can be 7 or 10 bytes, depending on where the function to
 which it redirects is placed (due to the "jmp <label>" instruction size).
 Calculate both variants: 10*4*3=120 bytes in the worst case and
 7*4*3=84 bytes in the best case. As a result we get 20+48+120=188 bytes
 in the worst case and 20+48+84=152 bytes in the best case. Compare this
 with the 245 bytes of auxiliary data and code we always get with virtual
 function redirectors in our example.
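  The two calculations can be repeated mechanically. The Python sketch below
 only reproduces the arithmetic of the text; the function "aux_bytes" and its
 parameter names are hypothetical.

```python
DWORD = 4   # size of one vtable entry in bytes


def aux_bytes(class_virtuals, interfaces, functions_per_interface,
              redirector_bytes_per_interface):
    """Total bytes of vtables and redirectors for one implementing class."""
    class_vtable = class_virtuals * DWORD
    interface_vtables = interfaces * functions_per_interface * DWORD
    redirectors = interfaces * redirector_bytes_per_interface
    return class_vtable + interface_vtables + redirectors


# 8 virtual functions, 3 extra interfaces with 4 functions each
virtual_case = aux_bytes(8, 3, 4, 13 + 14 + 14 + 14)   # 32 + 48 + 165
# 5 virtual functions remain when the "static" attribute is used
static_worst = aux_bytes(5, 3, 4, 4 * 10)              # 20 + 48 + 120
static_best = aux_bytes(5, 3, 4, 4 * 7)                # 20 + 48 + 84
print(virtual_case, static_worst, static_best)         # prints 245 188 152
```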
  Which redirector type is best depends on the situation: the virtual
 function redirector is preferred when there are many class branches which
 override virtual functions, as it is not affected by overriding, and in
 addition to the limited number of such redirectors you will get interface
 vtable optimization where possible. But if you have only a few classes
 which implement multiple interfaces, the static function redirector is
 preferred.
  So, to make your code smaller with the same functionality, it is better
 to keep redirectors and vtables in mind and use the best approach for the
 particular situation.
  As you may have noticed, the TImplAll class has IHello as its direct
 ancestor. This is done because IHello has the largest number of abstract
 functions among the implemented interfaces, and the direct ancestor is
 served by the main vtable, without redirectors. With static function
 redirectors this simply excludes the unnecessary ones. But with virtual
 function redirectors you may not get any advantage in some cases.
  Now let me continue with the next optimization issue - interface vptr
 optimization. This optimization helps to decrease the class instance size,
 with a small decrease in code size as well.
 Let me start with an example:

   ; suppose we have 2 interfaces declared
   class IMessage : IUnknown 

    uuid [0C8EB214-2406-4251-90E5-52F5BB596343]

    virtual_function ShowMessage = 0

   endclass

   class IWarnMessage : IMessage

    uuid [0C8EB214-2406-4251-90E5-52F5BB596344]

    virtual_function ShowWarningMessage = 0

   endclass

   ; and we have 2 implementors  
   class TImpl : IMessage

    field RefCount   dd   ?

    ; from IUnknown
    virtual_function QueryInterface
    virtual_function AddRef             
    virtual_function Release   
   
    ; from IMessage
    virtual_function ShowMessage

   endclass
 
   class TImpl2 : TImpl

    implement_interface IWarnMessage

    ; from IWarnMessage
    virtual_function ShowWarningMessage

    ; should be overridden as new interface is implemented
    virtual_function QueryInterface 

   endclass 

  In this example we have 2 interfaces and 2 implementors. The first
 implementor uses IMessage as its direct ancestor, and the second is a
 descendant of TImpl with the IWarnMessage interface added. What
 optimization can we get here? In this case it is not necessary to add a
 field with the IWarnMessage vptr to TImpl2, because IWarnMessage is a
 descendant of an already implemented interface. I mean that we can just
 replace the vptr to the IMessage implementation with a vptr to the
 IWarnMessage implementation in TImpl2. In our example IMessage works via
 the main vptr of the object, so no IWarnMessage vptr field will be added
 to TImpl2; the main vtable will be expanded with the ShowWarningMessage
 function instead.
  This is true for all implemented interfaces and works as follows: if
 there is any opportunity to expand the main vtable, it is done and no
 field with an interface vptr is added; if the main vtable cannot be used,
 the already implemented interfaces are checked and the first suitable
 interface vptr field is replaced with the new one. Only when there is no
 opportunity to replace an existing interface vptr is a new field added.
 Checking is performed from the first "implement_interface"'d interface to
 the last, so the result depends on the order of interfaces. This
 optimization is done automatically by the class declaration macros.
  As a result you get a smaller class instance in memory and less vtable
 initialization code in the "dynamic_init_vptr" macro in the constructor.
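  The three-step decision described above can be sketched as a small model.
 This Python code is an assumption-based illustration, not the logic of the
 macros: the test "same function names in the same order" is only one reading
 of the text, and "place_interface", "extends" and the IStream function list
 are hypothetical.

```python
def extends(existing, new):
    """True if 'new' starts with exactly the functions of 'existing'."""
    return new[:len(existing)] == existing


def place_interface(main_vtable, vptr_fields, name, functions):
    """Decide where the vptr of a newly implemented interface goes."""
    if extends(main_vtable, functions):
        main_vtable[:] = list(functions)          # expand the main vtable
        return "main vtable"
    for field in vptr_fields:                     # first declared, first checked
        if extends(field[1], functions):
            old_name = field[0]
            field[0], field[1] = name, list(functions)
            return "replaced " + old_name
    vptr_fields.append([name, list(functions)])   # last resort: a new field
    return "new field"


IUNKNOWN = ["QueryInterface", "AddRef", "Release"]
IMESSAGE = IUNKNOWN + ["ShowMessage"]
IWARNMESSAGE = IMESSAGE + ["ShowWarningMessage"]
ISEQSTREAM = IUNKNOWN + ["Read", "Write"]
ISTREAM = ISEQSTREAM + ["Seek"]                   # shortened, hypothetical list

main = list(IMESSAGE)                             # main vtable of TImpl
fields = []
steps = [
    place_interface(main, fields, "IWarnMessage", IWARNMESSAGE),
    place_interface(main, fields, "ISequentialStream", ISEQSTREAM),
    place_interface(main, fields, "IStream", ISTREAM),
]
```

  In this run IWarnMessage expands the main vtable, ISequentialStream gets a new
 field, and IStream then takes over that field instead of adding another one.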
  The next optimization issue is QueryInterface implementation optimization.
 In order to decrease the number of bytes needed to implement this function
 I use one shared piece of code and a set of data tables for it. In this
 case the QueryInterface of a class is just a redirector to the table search
 implementation, with the appropriate register values as parameters. When
 more than one QueryInterface function has to be implemented, this approach
 can save bytes. But in the case of a single function it depends on the
 number of interfaces the class implements. If this number is less than 3,
 then simple sequential checking is smaller; otherwise a table search
 without any redirector is preferred. This optimization is done in the
 "implement_QueryInterface" macro. There is another optimization for the
 case of multiple QueryInterface functions: data table optimization. It
 works just as for vtables: if we have a class that implements some
 interfaces and it has a descendant which implements some more, the
 descendant's table can be used by the QueryInterface of both classes.
  The automatic implementation of QueryInterface in my macro set has some
 limitations, which will be described in the details chapter.
  And now let me continue with configuration.


7.   Configuration.    
  
  There are some configuration macros and "equ" names in the classes macro
 set. They save you from searching the macro code in order to find and
 change some parameter: you only change the configuration to get the same
 result. Let me describe them. But first, a description of the modules:

  "ClassesCore 1.6.inc" - the main code that makes classes work, the core,
                       if it may be called so. It contains all the macros
                       used in class declarations.

  "ClassesCfg.inc" - contains configuration macros and names.

  "ClassesCode.inc" - contains executable code, which can be included at
                      any appropriate place in your program.
 
  "ClassesMCode.inc" - contains executable code used internally in
                       the classes macros (should be included before the
                       main core).

  So, everything except "ClassesCode.inc" should be included before the main
 core, and the moveable code - wherever you need it to be.
  Now back to configuration. The main configuration parameters are
 alignments and the ability to move the data produced by the classes core
 into a data section. All of them are declared with the "equ" directive;
 some of them only need to be defined ("equ" to nothing), and the others
 must be "equ" to a value. First - those which are "equ" to something.

  "CLASSES_DATA_MOVEABLE" - allows all classes data to be inserted later,
                            (combines VTABLES_MOVEABLE, IIDS_MOVEABLE,
                             QIDATA_MOVEABLE).

  "VTABLE_ALIGN" - vtables alignment value.

  "QI_DATATABLE_ALIGN" - data tables for QueryInterface implementation alignment
                         value.

  "IID_ALIGN" - interfaces IID alignment value.

  Data is aligned according to these values.
  Now the parameters that are "equ" to nothing:

  "VTABLES_MOVEABLE" - allows vtables to be moved from classes declaration to
                       other place, using macro "insert_vtables_data"

  "IIDS_MOVEABLE" - allows IIDs to be moved from classes to
                    other place, using macro "insert_iids_data"

  "QIDATA_MOVEABLE" - allows data tables for QueryInterface implementation
                      to be moved from classes declaration to
                      other place, using macro "insert_QI_data"

  "IUNKNOWN_INTERLOCKED" - use InterlockedIncrement(Decrement) functions in
                           common AddRef and Release implementation with
                           reference counters. If not declared - inc/dec 
                           instructions are used.

  "NOFREE_ON_RELEASE" - if declared, no memory deallocation done in
                        common Release implementation. By default
                        it is common Release that frees instance memory.

  And one special parameter, used mainly in the implementation of
 in-process COM servers:
 
  "GLOBAL_REFCOUNT" - if "equ" to some name - this name is used as name
                      of global reference counter, which will be processed in
                      common AddRef and Release implementations.
  
  To simplify moving the classes data you can use the macro
 "enable_classes_data_moving" before class declarations (or declare
 CLASSES_DATA_MOVEABLE before the includes) and "insert_classes_data" at
 the place where you need the data to be inserted.
  Note that the redirector code will be at the place where the classes
 are declared in any case.
  There are 2 macros which contain the memory allocation and deallocation
 routines in "ClassesMCode.inc" - "alloc_instance_memory" and
 "free_instance_memory". By default they use heap functions to work with
 memory, but you are free to use the memory functions you prefer. These
 macros are used in automatic implementations of constructors and in
 functions where instance memory is deallocated.
  To perform vtable and QueryInterface data table optimization use the
 macro "optimize_classes_data" after all class declarations, but before
 any code that uses this data (constructors and QueryInterface
 implementations).
  So, the preferred program structure looks like this:
    
   ; include main classes modules
   ; ------------------------------
   ; may use some configuration parameters here
   ; ...

   CLASSES_DATA_MOVEABLE equ ; to place all classes data later

   ; configuration module
   include "ClassesCfg.inc"

   ; or may use some configuration parameters here too
   ; ...

   ; here we can use "enable_classes_data_moving"
   ; instead of CLASSES_DATA_MOVEABLE
   ;enable_classes_data_moving

   ; include other classes modules
   ; suppose that "ClassesCode.inc" is included here too 
   include "ClassesMCode.inc"
   include "ClassesCore 1.6.inc"
   include "ClassesCode.inc" ; here, for example
   ; ---------------------------

   ; next - interfaces includes
   include "all_interfaces.inc"

   ; continue with classes declarations
   include "all_classes.inc"

   ; use optimization after all classes declarations 
   optimize_classes_data

   ; include classes implementations and other code
   include "classes_impl.inc"
   include "program_code.inc"
  
   ; ...


   ; insert classes data in data section
   ; or at the end of code section
   section ".bss" data readable writeable
   insert_classes_data ; suppose here

   ; continue with other data
   align 4 ; because QIdata is of 5 bytes items
           ; so if you need rest data to be aligned - do it 
   ; ...  

  Now let me say a few words about the "ClassesCode.inc" functions.
 IUnknown_AddRef, IUnknown_Release and FreeObjectWithDestructor have been
 mentioned in chapters 3 and 5. As for MainTableQueryInterface - do not
 call this function directly; it is called by the code that
 "implement_QueryInterface" generates, when needed. It is placed in this
 module just because it is moveable and can be anywhere in the code.
  Now let me continue with the details chapter.


8.   Details.

  This chapter is about some cases that have not been described yet. First,
 due to the interface vptr optimization in an implementing class, the next
 two class declarations are the same:

   class TImpl

    implement_interface IStream, ISomeInterface

    ; class functions
    ;  ....

   endclass
   
   ; equal declaration
   class TImpl : IStream

    implement_interface ISomeInterface

    ; class functions
    ;  ....
 
   endclass

  This is so because "IStream" in this case can safely expand the main
 vtable with its own functions. But if "TImpl" has an ancestor whose virtual
 functions do not allow this optimization, a pointer to the "IStream"
 vtable will be added as a field. Note that the virtual function check
 depends on the order of the functions and on their names.
  The second detail is the limitations of the automatic QueryInterface
 implementation mentioned earlier. In order to decrease code size I use one
 shared piece of code and data tables, which consist of elements holding a
 pointer to an IID and an offset from the object start. The pointer to the
 IID is always 4 bytes long, but the offset could be of different sizes. In
 this version of the macro set I left only 1 byte for this offset. That is
 why the maximum offset of an interface vptr is 255 - if it is greater,
 the program will not compile. You should remember this with descendants
 to which new interfaces are added. If no optimization can be performed,
 the interface vptr will be added after all class fields, and its offset
 may be greater than 255. So, to avoid this, it is preferred to keep the
 instance size of such classes below 255 bytes, or to add new interfaces in
 their descendants only when interface vptr optimization is possible.
 Another limitation: the number of different interfaces implemented in a
 class cannot be greater than 256. This is due to optimization too, and
 this limit can be reached very seldom, if at all.
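  The size of a table item and the 255 limit can be illustrated with the
 standard "struct" module of Python. This is a sketch under assumptions: the
 order of the two parts inside an item (pointer first, offset second), the
 function "pack_qi_item" and the sample address are hypothetical.

```python
import struct


def pack_qi_item(iid_address, vptr_offset):
    """One table item: a 4-byte pointer to the IID plus a 1-byte vptr offset."""
    # "<" means little-endian without padding, "I" is 4 bytes, "B" is 1 byte
    return struct.pack("<IB", iid_address, vptr_offset)


item = pack_qi_item(0x00402000, 12)
print(len(item))            # prints 5

try:
    pack_qi_item(0x00402000, 256)   # the offset no longer fits in one byte
    fits = True
except struct.error:
    fits = False
```

  The 5-byte item size is also the reason why data placed after such tables may
 need explicit alignment.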
  The third detail is COM aggregation. There are some additions to
 "implement_QueryInterface" for it. This macro actually has one of the
 following syntaxes:
  
   implement_QueryInterface <ClassName>,<QIName>[,<E_NOINTERFACE_macro>]
   implement_QueryInterface <ClassName>[-><QIName>[()]] [>> <E_NOINTERFACE_macro>]      

 where <ClassName> is the name of the class whose QueryInterface is being
 implemented, <QIName> is the name of the QueryInterface function (if
 omitted - "QueryInterface" is used) and <E_NOINTERFACE_macro> is the name
 of a macro with code for additional interface checking. Now - what is all
 this for? In order to implement an object that can be used as an
 aggregate, you should have two "IUnknown" implementations - one just
 redirects all calls to the outer object, and the other is the true
 implementation of the IUnknown functions. More information can be found
 on the net; my code is based on D. Box's book "Essential COM". That is why
 we need the ability to implement QueryInterface under any name we need.
 Now let me show an example of an aggregatable object, with only those
 parts implemented which concern aggregation:

   ; IUnknown declaration
   class IUnknown 

    uuid [00000000-0000-0000-C000-000000000046]

    virtual_function QueryInterface = 0
    virtual_function AddRef         = 0
    virtual_function Release        = 0 

   endclass

   ; this interface will be used as true IUnknown
   class IXUnknown 

    ; this "uuid" will not be placed as data
    uuid [00000000-0000-0000-C000-000000000046]

    virtual_function XQueryInterface = 0
    virtual_function XAddRef         = 0
    virtual_function XRelease        = 0 

   endclass

   ; some interface
   class IMessage : IUnknown

    uuid [5C8E5591-5AA5-47EA-B6EB-8FA7C8D7D3C7]   

    virtual_function ShowMessage = 0

   endclass

   ; interfaces implementor
   class TAggrMessage : IMessage

    field  RefCount   : DWORD ; reference counter

    ; mark as aggregatable and static redirected
    implement_interface IXUnknown(aggregatable + static)

    field  UnkOuter   dd    ? ; outer IUnknown

    function Create ; constructor
    ; destructor will be a stub here, as we should not
    ; free anything
    destructor Destroy = FuncStub1 

    ; from IUnknown
    ; implement it 
    virtual_function QueryInterface 
    virtual_function AddRef         
    virtual_function Release        
 
    ; from IMessage
    virtual_function ShowMessage 

    ; from IXUnknown
    ; standard implementations
    function XQueryInterface 
    function XAddRef         = IUnknown_AddRef
    function XRelease        = IUnknown_Release 

   endclass

   ; stub for destructor
   if used FuncStub1
    FuncStub1: retn 4
   end if

   ; constructor implementation  
   proc TAggrMessage.Create OuterUnk
    
    alloc_instance_mem sizeof.TAggrMessage
    test eax,eax
    jz .exit
    dynamic_init_vptr TAggrMessage 
    ; initialize RefCount and UnkOuter
    mov [eax + TAggrMessage.RefCount],1
    ; here we should check OuterUnk in order
    ; to work correct both as aggregate and separate object
    mov ecx,[OuterUnk]
    test ecx,ecx
    jnz .use_outer
    ; if no outer - use own
    ; each interface vptr field offset is formed as 
    ; "F"+<InterfaceName>
    lea ecx,[eax + TAggrMessage.FIXUnknown]
   .use_outer:
    ; use appropriate outer
    mov [eax + TAggrMessage.UnkOuter], ecx
   .exit:
    ret
   endp 

   ; implementation of non-delegate QueryInterface
   ; just use macro
   implement_QueryInterface TAggrMessage->XQueryInterface 

   ; IUnknown implementation - these are just redirectors,
   ; that's why the "proc" macro is not used here
   ; -------
   ; the redirectors take 47 bytes as written here; with shared
   ; auxiliary redirection code they can be reduced to 32 bytes

   if used TAggrMessage.QueryInterface
    TAggrMessage.QueryInterface:
     mov edx,[esp + 4] ; this
     ; call QueryInterface of UnkOuter
     ; just jump there
     mov eax,[edx + TAggrMessage.UnkOuter]
     mov [esp + 4],eax ; replace "this" by UnkOuter
     mov eax,[eax]
     jmp near [eax] 
   end if

   if used TAggrMessage.AddRef
    TAggrMessage.AddRef:
     mov edx,[esp + 4] ; this
     ; call AddRef of UnkOuter
     ; just jump there
     mov eax,[edx + TAggrMessage.UnkOuter]
     mov [esp + 4],eax ; replace "this" by UnkOuter
     mov eax,[eax]
     jmp near [eax + 4] 
   end if

   if used TAggrMessage.Release
    TAggrMessage.Release:
     mov edx,[esp + 4] ; this
     ; call Release of UnkOuter
     ; just jump there
     mov eax,[edx + TAggrMessage.UnkOuter]
     mov [esp + 4],eax ; replace "this" by UnkOuter
     mov eax,[eax]
     jmp near [eax + 8] 
   end if
   ; -------
 
   ; some ShowMessage implementation
   ; ...

  In this example a new interface attribute is introduced - "aggregatable".
 It marks the interface to be checked first among the others in
 QueryInterface, so the offset returned for the implementing class's
 IUnknown will now be the offset of the vptr field of that interface.
  Interface attributes are separated with "+" or "&" symbols. There is one
 more attribute, "virtual", which is the default and makes the redirector
 to a virtual function be checked first for the interface.
  The CreateInstance function of an aggregatable object's class factory
 should use the non-delegating implementation of the object's IUnknown:

   proc TClassFactory.CreateInstance this, pUnkOuter, riid, ppvObj

    ; zero ppvObj
    xor eax,eax
    mov edx,[ppvObj]
    mov [edx],eax

    ; create object
    constrcall TAggrMessage->Create([pUnkOuter])
    test eax,eax
    jnz @F
    ; if no object created
    mov eax,E_OUTOFMEMORY
    jmp .exit
   @@:
    ; continue with new object
    ; use non-delegate QueryInterface
    push eax ; -> save pointer to object
    clsfcall TAggrMessage(eax)->XQueryInterface([riid],[ppvObj])
    pop edx ; <- restore pointer to object
    push eax ; -> save HRESULT
    ; call inner Release
    clsfcall TAggrMessage(edx)->XRelease()
    pop eax ; <- restore HRESULT  
   .exit:
    ret
   endp    
 
  As a result you get an object with implemented interfaces which works
 both as an aggregate and as a separate object.
  You may have noticed that to get the offset of an interface vptr field
 in a class you can use the <ClassName>.F<InterfaceName> construct. Note
 that you can rely on it only with <ClassName>, not with an object name, as
 there may be no field named F<InterfaceName> in the object due to
 optimization.
  Now let me describe the <E_NOINTERFACE_macro> parameter of
 "implement_QueryInterface". It is used in the implementation of an object
 that aggregates other objects. In this case additional IID checking must
 be done in the QueryInterface function. Let me continue with an example:
   
   ; suppose we have implementor of IMessage
   ; that actually does not implement it, but
   ; use aggregated object in order to
   ; be seen by clients as implementor.
   ; but IUnknown must be implemented as own
   class TMessage : IUnknown

    field  RefCount   dd    ? ; reference counter
    field  MsgObj     dd    ? ; actual implementor of IMessage
    
    ; both of these functions should be implemented
    function Destroy
    function Create

    ; from IUnknown
    virtual_function QueryInterface 
    virtual_function AddRef    = IUnknown_AddRef     
    virtual_function Release   = IUnknown_Release     

   endclass 

   ; constructor implementation
   proc TMessage.Create

    alloc_instance_mem sizeof.TMessage
    test eax,eax
    jz .exit
    dynamic_init_vptr TMessage
    ; initialize RefCount and inner object 
    mov [eax + TMessage.RefCount],1
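    ; suppose inner object is created with this code;
    ; COM requires the outer object to ask an aggregate
    ; for IUnknown, which gives its non-delegating IUnknown
    ; (IID_IUnknown is assumed to be defined like IID_IMessage)
    push eax ; -> save this
    lea edx,[eax + TMessage.MsgObj] ; prepare ppvObj
    invoke CoCreateInstance,CLSID_TAggrMessage,eax,CLSCTX_INPROC_SERVER,\
                            IID_IUnknown,edx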
    test eax,eax 
    pop eax ; <- restore this
    jz .exit ; if S_OK - exit
    ; inner object has not been created
    ; free our object and return zero
    clsfcall IUnknown(eax)->Release()
    xor eax,eax
   .exit:
    ret
   endp
   
   ; destructor implementation
   proc TMessage.Destroy this
   
    ; free inner object
    mov eax,[this]
    clsfcall IUnknown([eax + TMessage.MsgObj])->Release()
    ret
   endp

   ; additional code for QueryInterface checking
   ; the macro receives the same parameter names
   ; as QueryInterface does
   macro TMessage_AggregateCheck this, riid, ppvObj
   {
    ; this code is executed only if no interface
    ; is found by the main QueryInterface implementation code 

    ; check inner aggregate 
    mov eax,[this]
    clsfcall IUnknown([eax + TMessage.MsgObj])->QueryInterface([riid],[ppvObj])

    ; no jump is needed here, but with several such aggregates
    ; you should check the result of each QueryInterface and, on
    ; success, use a "jmp .exit" instruction
    ; the last such QueryInterface need not be followed by this
    ; check and jump  
   }

   ; use this macro in the automatic implementation.
   ; the code from the macro will be injected into the
   ; automatic implementation
   implement_QueryInterface TMessage >> TMessage_AggregateCheck 

  The result of this example is an object that clients see as an
 "IMessage" implementor. Let me say a few words about the macro 
 "TMessage_AggregateCheck", which is used as additional code in 
 "TMessage.QueryInterface". When the code of such a macro is executed, the
 stack frame is already set up, and "this, riid, ppvObj" are the "ebp + 8,
 ebp + 12, ebp + 16" values. The automatic implementation contains an ".exit"
 label; when checking more than one aggregate, you should jump to this label
 as soon as an aggregate's QueryInterface succeeds. The last
 QueryInterface should not be followed by a result check and jump. 
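  As an illustration only, here is a sketch of such a macro for two aggregates.
 The macro name "TMessage_AggregateCheck2" and the field "LogObj" are hypothetical:
 TMessage as declared above has only "MsgObj", so a second "field LogObj dd ?"
 is assumed. The sketch treats only S_OK (zero) as success:

   macro TMessage_AggregateCheck2 this, riid, ppvObj
   {
    ; first aggregate: check result and leave on success
    mov eax,[this]
    clsfcall IUnknown([eax + TMessage.MsgObj])->QueryInterface([riid],[ppvObj])
    test eax,eax
    jz .exit ; S_OK - interface found, eax already holds HRESULT

    ; last aggregate (hypothetical LogObj field): no check, no jump,
    ; its HRESULT becomes the result of QueryInterface
    mov eax,[this]
    clsfcall IUnknown([eax + TMessage.LogObj])->QueryInterface([riid],[ppvObj])
   }

  It would then be passed the same way as in the example above, that is
 "implement_QueryInterface TMessage >> TMessage_AggregateCheck2".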
  

--------------------------------------------------------------

  To summarize, let me say a big thank-you to Tomasz Grysztar, the creator
 and developer of FASM, because it is his FASM that makes all these classes 
 possible to implement.

  Thanks and best regards.

 
